The Power of Randomization: Distributed Submodular Maximization on Massive Datasets
نویسندگان
چکیده
A wide variety of problems in machine learning, including exemplar clustering, document summarization, and sensor placement, can be cast as constrained submodular maximization problems. Unfortunately, the resulting submodular optimization problems are often too large to be solved on a single machine. We develop a simple distributed algorithm that is embarrassingly parallel and it achieves provable, constant factor, worst-case approximation guarantees. In our experiments, we demonstrate its efficiency in large problems with different kinds of constraints with objective values always close to what is achievable in the centralized setting.
منابع مشابه
Deterministic Algorithms for Submodular Maximization Problems
Randomization is a fundamental tool used in many theoretical and practical areas of computer science. We study here the role of randomization in the area of submodular function maximization. In this area most algorithms are randomized, and in almost all cases the approximation ratios obtained by current randomized algorithms are superior to the best results obtained by known deterministic algor...
متن کاملDistributed Submodular Maximization: Identifying Representative Elements in Massive Data
Many large-scale machine learning problems (such as clustering, non-parametric learning, kernel machines, etc.) require selecting, out of a massive data set, a manageable yet representative subset. Such problems can often be reduced to maximizing a submodular set function subject to cardinality constraints. Classical approaches require centralized access to the full data set; but for truly larg...
متن کاملSubmodular Maximization over Sliding Windows
In this paper we study the extraction of representative elements in the data stream model in the form of submodular maximization. Different from the previous work on streaming submodular maximization, we are interested only in the recent data, and study the maximization problem over sliding windows. We provide a general reduction from the sliding window model to the standard streaming model, an...
متن کاملHorizontally Scalable Submodular Maximization
A variety of large-scale machine learning problems can be cast as instances of constrained submodular maximization. Existing approaches for distributed submodular maximization have a critical drawback: The capacity – number of instances that can fit in memory – must grow with the data set size. In practice, while one can provision many machines, the capacity of each machine is limited by physic...
متن کاملInfluence Maximization with ε-Almost Submodular Threshold Functions
Influence maximization is the problem of selecting k nodes in a social network to maximize their influence spread. The problem has been extensively studied but most works focus on the submodular influence diffusion models. In this paper, motivated by empirical evidences, we explore influence maximization in the nonsubmodular regime. In particular, we study the general threshold model in which a...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2015